23 research outputs found

    Understanding Gradient Descent on Edge of Stability in Deep Learning

    Deep learning experiments by Cohen et al. [2021] using deterministic Gradient Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and sharpness (i.e., the largest eigenvalue of the Hessian) no longer behave as in traditional optimization. Sharpness stabilizes around 2/LR and loss goes up and down across iterations, yet still with an overall downward trend. The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss. This is in contrast to many previous results about implicit bias either relying on infinitesimal updates or noise in gradient. Formally, for any smooth function $L$ with certain regularity condition, this effect is demonstrated for (1) Normalized GD, i.e., GD with a varying LR $\eta_t = \frac{\eta}{\| \nabla L(x(t)) \|}$ and loss $L$; (2) GD with constant LR and loss $\sqrt{L - \min_x L(x)}$. Both provably enter the Edge of Stability, with the associated flow on the manifold minimizing $\lambda_{1}(\nabla^2 L)$. The above theoretical results have been corroborated by an experimental study. Comment: 63 pages. This paper has been accepted for conference proceedings in the 39th International Conference on Machine Learning (ICML), 202
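
    The 2/LR threshold and the Normalized GD update described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's analysis or experiments: it assumes a one-dimensional quadratic loss L(x) = 0.5 * a * x^2, whose sharpness is simply a, and the function names are hypothetical. Plain GD contracts when a < 2/LR and diverges when a > 2/LR, while Normalized GD moves a fixed distance LR each step and so ends up hovering within LR of the minimizer instead of converging.

```python
def gd_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Plain GD on the toy loss L(x) = 0.5 * sharpness * x**2; returns final |x|."""
    x = x0
    for _ in range(steps):
        grad = sharpness * x          # dL/dx for the quadratic
        x = x - lr * grad             # each step multiplies x by (1 - lr * sharpness)
    return abs(x)


def normalized_gd_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Normalized GD on the same toy loss: step size lr / |grad|; returns final |x|."""
    x = x0
    for _ in range(steps):
        grad = sharpness * x
        if grad == 0.0:               # exactly at the minimizer: update is undefined, stop
            break
        x = x - lr * grad / abs(grad)  # every step moves exactly lr toward/past the minimum
    return abs(x)


lr = 0.1
threshold = 2.0 / lr                  # sharpness above this makes plain GD unstable

below = gd_quadratic(sharpness=0.9 * threshold, lr=lr)   # multiplier -0.8: shrinks
above = gd_quadratic(sharpness=1.1 * threshold, lr=lr)   # multiplier -1.2: grows
hover = normalized_gd_quadratic(sharpness=5.0, lr=lr)    # stays within lr of 0

print(f"threshold 2/LR = {threshold}")
print(f"|x| after GD, sharpness below threshold: {below:.3e}")
print(f"|x| after GD, sharpness above threshold: {above:.3e}")
print(f"|x| after Normalized GD: {hover:.3e}")
```

    In one dimension there is no manifold of minimizers, so this toy case shows only the stability threshold and the non-convergent oscillation, not the sharpness-reducing flow the paper analyzes.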

    Do Transformers Parse while Predicting the Masked Word?

    Pre-trained language models have been shown to encode linguistic structures, e.g. dependency and constituency parse trees, in their embeddings while being trained on unsupervised loss functions like masked language modeling. Some doubts have been raised about whether the models are actually parsing or only performing some computation weakly correlated with it. We study two questions: (a) Is it possible to explicitly describe transformers with realistic embedding dimension, number of heads, etc. that are capable of parsing, or even approximate parsing? (b) Why do pre-trained models capture parsing structure? This paper takes a step toward answering these questions in the context of generative modeling with PCFGs. We show that masked language models like BERT or RoBERTa of moderate sizes can approximately execute the Inside-Outside algorithm for the English PCFG [Marcus et al, 1993]. We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data. We also give a construction of transformers with 50 layers, 15 attention heads, and 1275-dimensional embeddings on average such that, using their embeddings, it is possible to do constituency parsing with >70% F1 score on the PTB dataset. We conduct probing experiments on models pre-trained on PCFG-generated data to show that this not only allows recovery of an approximate parse tree, but also recovers marginal span probabilities computed by the Inside-Outside algorithm, which suggests an implicit bias of masked language modeling towards this algorithm
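
    For readers unfamiliar with the Inside-Outside algorithm named in the abstract, the sketch below shows only its inside pass, on a tiny hand-written PCFG in Chomsky normal form. The grammar, the sentence, and the function name are illustrative assumptions and have nothing to do with the English PCFG or the transformer construction used in the paper. The inside probability of a span (i, j) and nonterminal A is the total probability that A generates words i..j-1, computed bottom-up over span widths.

```python
from collections import defaultdict


def inside_probabilities(words, lexical, binary):
    """Inside pass for a PCFG in Chomsky normal form.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns a dict mapping (i, j, A) to P(A generates words[i:j]).
    """
    n = len(words)
    inside = defaultdict(float)

    # Width-1 spans: apply lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                inside[(i, i + 1, a)] += p

    # Wider spans: sum over split points k and binary rules A -> B C.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = inside.get((i, k, b), 0.0)
                    right = inside.get((k, j, c), 0.0)
                    if left and right:
                        inside[(i, j, a)] += p * left * right
    return inside


# Hypothetical toy grammar (not from the paper).
lexical = {("NP", "dogs"): 0.5, ("NP", "cats"): 0.5, ("V", "chase"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

sentence = ["dogs", "chase", "cats"]
inside = inside_probabilities(sentence, lexical, binary)
sentence_prob = inside[(0, len(sentence), "S")]
print(sentence_prob)
```

    The full algorithm adds an outside pass, and the product of inside and outside scores gives the marginal span probabilities that the abstract's probing experiments refer to.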

    Low-Abundance Members of the Firmicutes Facilitate Bioremediation of Soil Impacted by Highly Acidic Mine Drainage From the Malanjkhand Copper Project, India

    Sulfate- and iron-reducing heterotrophic bacteria represent a minor proportion of the indigenous microbial community of highly acidic, oligotrophic acid mine drainage (AMD), but they can be successfully stimulated for in situ bioremediation of an AMD-impacted soil (AIS). Although these anaerobic microorganisms play a central role in sulfate and metal removal, they remain inactive in the AIS owing to the paucity of organic carbon and the extreme acidity of the local environment. The present study investigated the scope for increasing the abundance and activity of the resident sulfate- and iron-reducing bacterial populations of an AIS from the Malanjkhand Copper Project. An AIS of pH 3.5 with high soluble SO₄²⁻ (7838 mg/l) and Fe (179 mg/l) content was amended with nutrients (cysteine and lactate). Thorough geochemical analysis, 16S rRNA gene amplicon sequencing and qPCR highlighted the intrinsic metabolic abilities of native bacteria in AMD bioremediation. Following 180 days of incubation, the nutrient-amended AIS showed a marked increase in pH (to 6.6) and reductions in soluble SO₄²⁻ (95%), Fe (50%) and other heavy metals. Concomitant with the physicochemical changes, a vivid shift in microbial community composition was observed. Members of the Firmicutes, present as a minor group (1.5% of the total community) in the AIS, emerged as the single most abundant taxon (∼56%) following nutrient amendment. Organisms affiliated to Clostridiaceae, Peptococcaceae, Veillonellaceae, Christensenellaceae, Lachnospiraceae, Bacillaceae, etc., known for their fermentative, iron-reducing and sulfate-reducing abilities, prevailed in the amended samples. qPCR data corroborated this change and further revealed an increase in the abundance of the dissimilatory sulfite reductase gene (dsrB) and of specific bacterial taxa. Involvement of these enhanced populations in reductive processes was validated by further enrichments and growth in sulfate- and iron-reducing media. Amplicon sequencing of these enrichments confirmed growth of Firmicutes members and proved their sulfate- and iron-reduction abilities. This study provides better insight into the ecological role of Firmicutes members within AMD-impacted sites, particularly their involvement in sulfate- and iron-reduction processes, in situ pH management and bioremediation

    Over-expression of the splice variant of CONSTANS enhances the in vitro synthesis of silver nanoparticles

    The eco-friendly biosynthesis of silver nanoparticles using plant extracts is an exciting advancement in bio-nanotechnology and has been successfully attempted in more than 41 plant species. However, an established model plant system for unravelling the biochemical pathways of silver nanoparticle (AgNP) production is lacking. Here we demonstrate in vitro silver nanoparticle biosynthesis in Arabidopsis thaliana, a genetic model plant, and in lines misexpressing the splice variant of CONSTANS (COβ). We employed biochemical and spectroscopic methods, Transmission Electron Microscopy (TEM), Raman spectroscopy, Nuclear Magnetic Resonance (NMR) and powder X-ray diffraction (powder XRD), using selected mutants and an over-expressing line of Arabidopsis thaliana involved in sugar homeostasis. Additionally, a comparative analysis of AgNP synthesis using different transgenic lines of Arabidopsis was carried out. We show that plant extracts of COβ and gi-100 (a mutant line of GIGANTEA) had the highest potential for nanoparticle production compared with Col-0 and the over-expressing line of GIGANTEA (35SGi). Silver nanoparticle production in Arabidopsis not only opens up the possibility of using molecular genetics tools to understand the biochemical pathways, but could also address the mechanism behind the different shapes of AgNPs produced using plant extracts

    Assessment of knowledge, attitude and practice towards occupational health hazards and safety measures among health care personnel working in public health facilities of Bhubaneswar Block, India

    Background: Healthcare personnel (HCP) work in an environment that is known to be one of the most hazardous settings to work in. Occupational diseases are often under-reported; there are many reasons for this gross under-notification, one of the main ones being that they are usually less obvious than other occupational accidents and injuries. Aims: To assess the knowledge, attitude and practice of HCP regarding different aspects of occupational health hazards and to identify correlates of their knowledge, attitude and practice regarding occupational hazards and safety measures. Methods and Material: This was a descriptive cross-sectional study undertaken in public health facilities of Bhubaneswar Block, Odisha. The study was conducted over a period of one year. One hundred seventy-two health care providers (both medical and paramedical, with a minimum experience of six months) were included. Statistical analysis used: Descriptive statistics were used, with the Pearson chi-square test as the test of significance; a p value of < 0.05 was taken as statistically significant. Results: Mean age of the respondents is 38.44 years with a standard deviation of 12.8